---
title: "Web scraping with rvest"
author: "Tony Duan"
execute:
  warning: false
  error: false
format:
  html:
    toc: true
    toc-location: right
    code-fold: show
    code-tools: true
    number-sections: true
    code-block-bg: true
    code-block-border-left: "#31BAE9"
---
# load packages
```{r}
library(rvest)
library(tidyverse)
```
# read html
```{r}
# Download and parse the page into an HTML document object
url <- "https://www.r-project.org/"
page <- read_html(url)
```
# get HTML text
```{r}
page |> html_element(css = "h1") |> html_text(trim = TRUE)
```
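
`html_element()` returns only the first match, while `html_elements()` returns every match. The following is a minimal offline sketch of that difference. It uses rvest's `minimal_html()` on a made-up HTML snippet, so the markup and the object names (`doc`, `first_p`, `all_p`, `heading`) are illustrative assumptions, not content from the R project page.

```{r}
library(rvest)

# Hypothetical HTML snippet, built offline so no network access is needed
doc <- minimal_html("
  <h1>  Main title  </h1>
  <p>First paragraph</p>
  <p>Second paragraph</p>
")

# html_element() keeps the first matching node only
first_p <- doc |> html_element("p") |> html_text(trim = TRUE)

# html_elements() keeps all matching nodes; html_text() then returns a vector
all_p <- doc |> html_elements("p") |> html_text(trim = TRUE)

# trim = TRUE removes the whitespace around the heading text
heading <- doc |> html_element("h1") |> html_text(trim = TRUE)

first_p
all_p
heading
```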
# get HTML link
```{r}
# Text shown for the first link inside a <strong> element
page |> html_element(css = "strong a") |> html_text(trim = TRUE)
```
```{r}
# Target URL (href attribute) of the same link
page |> html_element(css = "strong a") |> html_attr("href")
```
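
Links scraped with `html_attr("href")` are often relative. A small sketch of collecting every link on a page and resolving it against a base URL with `xml2::url_absolute()` follows. The HTML snippet and the `link_table` name are hypothetical, and the base URL is assumed to be the page that was read above.

```{r}
library(rvest)

# Hypothetical snippet with one relative and one absolute link
doc <- minimal_html('
  <p><a href="/about.html">About</a></p>
  <p><a href="https://cran.r-project.org/">CRAN</a></p>
')

links <- doc |> html_elements("a")

# One row per link: visible text and raw href attribute
link_table <- data.frame(
  text = html_text(links, trim = TRUE),
  href = html_attr(links, "href")
)

# Resolve relative hrefs against the (assumed) base URL of the page
link_table$full_url <- xml2::url_absolute(
  link_table$href,
  "https://www.r-project.org/"
)

link_table
```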
# get table
```{r}
url <- "https://en.wikipedia.org/wiki/List_of_Formula_One_drivers"
page <- read_html(url)
```
## get 3rd table
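Find the table's XPath, for example with the browser's developer tools (Inspect, then Copy > Copy XPath), and pass it to `html_element()`.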
```{r}
# Named drivers_tbl rather than table, so base::table() is not masked
drivers_tbl <- page |>
  html_element(xpath = '//*[@id="mw-content-text"]/div[1]/table[3]') |>
  html_table()
drivers_tbl |> head()
```
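
A positional XPath such as `table[3]` stops working if the page gains or loses a table. One possible alternative, sketched below on a made-up snippet, is to select tables by a CSS class and pick from the resulting list. The class names, the table contents, and the `tables` and `drivers` names are assumptions for illustration. Whether the real page's tables carry a usable class should be checked in the browser first.

```{r}
library(rvest)

# Hypothetical page with two tables that differ by class
doc <- minimal_html('
  <table class="infobox">
    <tr><th>Key</th><th>Value</th></tr>
    <tr><td>a</td><td>1</td></tr>
  </table>
  <table class="wikitable">
    <tr><th>Driver</th><th>Wins</th></tr>
    <tr><td>Driver A</td><td>3</td></tr>
    <tr><td>Driver B</td><td>0</td></tr>
  </table>
')

# html_table() on several nodes returns a list with one tibble per table
tables <- doc |> html_elements("table.wikitable") |> html_table()

length(tables)
drivers <- tables[[1]]
drivers
```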
## get 4th table
Find the XPath of the fourth table in the same way and pass it to `html_element()`.
```{r}
# Named countries_tbl rather than table, so base::table() is not masked
countries_tbl <- page |>
  html_element(xpath = '//*[@id="mw-content-text"]/div[1]/table[4]') |>
  html_table()
countries_tbl |> head()
```
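
Scraped tables often come back with character columns, because cells can contain footnote markers or other text next to the numbers. The sketch below shows one way to tidy such columns with `readr::parse_number()`. The data frame is invented for illustration: the column names only mimic the shape of a scraped table, and the values are not taken from Wikipedia.

```{r}
library(dplyr)
library(readr)

# Hypothetical rows in the shape a scraped table might have
scraped <- tibble::tibble(
  `Driver name` = c("Driver A", "Driver B"),
  `Race wins` = c("3", "0"),
  `Points[a]` = c("12.5", "0[b]")
)

cleaned <- scraped |>
  # Drop the footnote marker from the column name
  rename(Points = `Points[a]`) |>
  # parse_number() keeps the first number it finds and ignores trailing text
  mutate(
    `Race wins` = parse_number(`Race wins`),
    Points = parse_number(Points)
  )

cleaned
```

`parse_number()` only keeps the first number in each cell, so cells that hold several values, such as a range of seasons, need separate handling.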
```{r, attr.output='.details summary="sessionInfo()"'}
sessionInfo()
```
# using read_html_live() for more advanced web scraping
```{r}
library(rvest)
library(tidyverse)
```
```{r}
# read_html_live() loads the page in a headless Chrome session (via chromote)
url <- "https://www.whiskybase.com/whiskies/"
web <- read_html_live(url)
```
```{r}
#| eval: false
web$view()
```
```{r}
intro_text <- web |>
  html_elements(".widget-article-content") |>
  html_text(trim = TRUE)
intro_text
```
# Reference
https://r4ds.hadley.nz/webscraping.html
https://rvest.tidyverse.org/reference/read_html_live.html